Creating hidden Markov models for fast speech

Authors

  • Thilo Pfau
  • Günther Ruske
Abstract

This paper deals with the problem of building HMMs suitable for fast speech. Fast speech leads to increased error rates on various tasks. In the first part of the paper, an automatic procedure is presented for splitting speech material into categories according to speaking rate, which is fundamental to all investigations of speaking rate. In the second part, the problem of sparse data for estimating HMMs for fast speech is discussed, followed by a comparison of different methods for overcoming this problem. The main emphasis is on robust reestimation techniques such as maximum a posteriori (MAP) estimation, as well as on methods that reduce the variability of the speech signal and thereby allow the number of HMM parameters to be reduced. Vocal tract length normalization (VTLN) is chosen for that purpose. In the last part, various combinations of the methods discussed are compared on the basis of error rates for continuous speech recognition on fast speech. The best method (VTLN followed by MAP reestimation) yields an overall decrease in the error rate of 10% relative to the baseline system.
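
To illustrate why MAP reestimation helps when fast-speech data is sparse, the following is a minimal sketch of the commonly used conjugate-prior MAP update for a single Gaussian mean. It is not taken from the paper: the function name `map_update_mean`, the prior weight `tau`, and its default value are assumptions made for illustration, and the paper's actual reestimation formulas may differ.

```python
def map_update_mean(prior_mean, frames, posteriors, tau=10.0):
    """MAP re-estimate of one Gaussian mean (illustrative sketch).

    prior_mean : list of D floats, mean of the baseline model
    frames     : list of T feature vectors (each of length D) from the
                 fast-speech adaptation data
    posteriors : list of T occupation probabilities of this Gaussian
    tau        : prior weight, must be > 0 (hypothetical default)
    """
    dim = len(prior_mean)
    # Total soft count of frames assigned to this Gaussian.
    occupancy = sum(posteriors)
    # Posterior-weighted sum of the adaptation frames.
    weighted_sum = [0.0] * dim
    for gamma, frame in zip(posteriors, frames):
        for d in range(dim):
            weighted_sum[d] += gamma * frame[d]
    # Interpolate between the prior mean and the data, weighted by tau
    # and by the amount of adaptation data actually observed.
    return [
        (tau * prior_mean[d] + weighted_sum[d]) / (tau + occupancy)
        for d in range(dim)
    ]


# Two adaptation frames, prior weight 2.0:
print(map_update_mean([0.0, 0.0], [[2.0, 2.0], [4.0, 4.0]], [1.0, 1.0], tau=2.0))
# → [1.5, 1.5]
```

With little adaptation data the estimate stays close to the baseline mean, and as the occupancy grows it approaches the ordinary maximum likelihood estimate, which is the property that makes this kind of update robust for sparsely observed fast speech.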

Similar resources

Speech enhancement based on hidden Markov model using sparse code shrinkage

This paper presents a new hidden Markov model-based (HMM-based) speech enhancement framework based on independent component analysis (ICA). We propose analytical procedures for training clean speech and noise models by the Baum re-estimation algorithm and present a maximum a posteriori (MAP) estimator based on a Laplace-Gaussian combination (for clean speech and noise, respectively) in the HMM ...

Creating hidden Markov models for fast speech by optimized clustering

Previous studies have shown that recognition accuracy often severely degrades at higher speech rates, which can basically be traced back to two main dimensions: acoustic and phonemic. Reasons for this effect can be found in the phonemic field (e.g. elisions) as well as on the acoustic level: with increasing rates of speech, the spectral characteristics change. A main obstacle in this context is...

Speaker normalization and pronunciation variant modeling: helpful methods for improving recognition of fast speech

This paper addresses the problem of creating hidden Markov models for fast speech. The major issues discussed are robust parameter estimation and reducing within-model variations. Regarding the first issue, the use of maximum a posteriori parameter estimation is discussed. To reduce within-model variations, a maximum-likelihood-based vocal tract length normalization procedure and a...

Publication date: 1998